Logistic distribution — the “sigmoid” law on ℝ#

The logistic distribution is a symmetric, bell-shaped continuous distribution on the real line whose CDF is the logistic (sigmoid) function. It is closely tied to log-odds (logit) transformations and to logistic regression (as its latent error model), and it provides a simple, heavier-tailed alternative to the normal distribution.

What you’ll learn#

  • how the PDF/CDF/quantile relate to the sigmoid and logit

  • closed-form moments (mean/variance/skewness/kurtosis), MGF/CF, and entropy

  • parameter interpretation (location \(\mu\), scale \(s\)) and how shape changes

  • NumPy-only sampling via inverse transform + Monte Carlo validation

  • practical usage via scipy.stats.logistic (pdf, cdf, rvs, fit)

import platform

import numpy as np

import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots

import scipy
from scipy import optimize, stats
from scipy.stats import chi2, logistic, norm

# Plotly rendering (CKC convention)
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

# Reproducibility
rng = np.random.default_rng(7)
np.set_printoptions(precision=4, suppress=True)

print("Python", platform.python_version())
print("NumPy", np.__version__)
print("SciPy", scipy.__version__)
Python 3.12.9
NumPy 1.26.2
SciPy 1.15.0

1) Title & Classification#

  • Name: logistic

  • Type: continuous distribution

  • Support: \(x \in (-\infty, \infty)\)

  • Parameter space: location \(\mu \in \mathbb{R}\) and scale \(s > 0\)

We write:

\[X \sim \mathrm{Logistic}(\mu, s).\]

The standard logistic is \(\mathrm{Logistic}(0,1)\).

SciPy uses the same location/scale form: stats.logistic(loc=mu, scale=s).

2) Intuition & Motivation#

2.1 What it models#

The logistic distribution is a good model for real-valued noise that is:

  • symmetric (centered around \(\mu\))

  • unimodal (single peak at \(\mu\))

  • heavier-tailed than a normal (but still exponentially decaying)

A practical intuition: compared to a normal distribution with the same variance, logistic puts more probability mass in the tails.

2.2 Typical real-world use cases#

  • Latent-variable view of logistic regression: if a latent score is perturbed by logistic noise and thresholded, the resulting class probability is a sigmoid.

  • Log-odds modeling: if \(P \in (0,1)\) is a random probability, then \(\log\!\left(\frac{P}{1-P}\right)\) lives on \(\mathbb{R}\); logistic is a natural simple choice for such log-odds.

  • Convenient alternative to a normal: similar bell shape, simple CDF/quantile.

  • Mixture models / generative models: mixtures of logistics are used to model complex continuous densities (notably in some neural image models).

2.3 Relations to other distributions#

  • Uniform ↔ logistic (logit link): if \(U\sim\mathrm{Unif}(0,1)\), then \(\log\!\left(\frac{U}{1-U}\right) \sim \mathrm{Logistic}(0,1)\). Conversely, if \(X\sim\mathrm{Logistic}(\mu,s)\), then \(F(X)\sim\mathrm{Unif}(0,1)\).

  • Gumbel difference: if \(G_1, G_2\) are i.i.d. Gumbel with the same scale, then \(G_1 - G_2\) is logistic.

  • Normal approximation: matching variances gives \(\mathrm{Logistic}(0, s) \approx \mathcal{N}(0,1)\) when \(s=\sqrt{3}/\pi\approx 0.5513\).

  • Log-logistic: if \(X\sim\mathrm{Logistic}(\mu,s)\), then \(\exp(X)\) is log-logistic.
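These relations are easy to spot-check numerically. A minimal Monte Carlo sketch (the variable names and seed here are ours, not part of the notebook's API): the logit of a uniform and a difference of i.i.d. Gumbels should both have the standard logistic variance \(\pi^2/3\), and the variance-matched scale \(s=\sqrt{3}/\pi\) gives unit variance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100_000

# 1) logit of a uniform is standard logistic
u = rng.random(n)
logit_u = np.log(u) - np.log1p(-u)

# 2) difference of two i.i.d. Gumbels (same scale) is standard logistic
gumbel_diff = rng.gumbel(size=n) - rng.gumbel(size=n)

# Both should match the standard logistic variance pi^2/3 ≈ 3.29
print("logit(U) var   :", logit_u.var())
print("Gumbel diff var:", gumbel_diff.var())
print("KS p (logit U) :", stats.kstest(logit_u, stats.logistic.cdf).pvalue)

# 3) variance matching against N(0,1): s = sqrt(3)/pi gives unit variance
s_match = np.sqrt(3.0) / np.pi
print("matched variance:", (np.pi * s_match) ** 2 / 3.0)  # exactly 1
```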

3) Formal Definition#

Let

\[z = \frac{x-\mu}{s}.\]

3.1 PDF#

Different equivalent forms are useful:

\[f(x\mid\mu,s) = \frac{e^{-z}}{s\,(1+e^{-z})^2} = \frac{1}{s}\,\sigma(z)\bigl(1-\sigma(z)\bigr) = \frac{1}{4s}\,\operatorname{sech}^2\!\left(\frac{z}{2}\right),\]

where \(\sigma(z)=\frac{1}{1+e^{-z}}\).

3.2 CDF#

\[F(x\mid\mu,s) = \sigma\!\left(\frac{x-\mu}{s}\right)=\frac{1}{1+e^{-z}}.\]

3.3 Quantile function (inverse CDF)#

For \(p\in(0,1)\):

\[F^{-1}(p) = \mu + s\,\log\!\left(\frac{p}{1-p}\right).\]

This closed-form inverse CDF makes inverse transform sampling especially simple.

def sigmoid(z: np.ndarray) -> np.ndarray:
    # Stable logistic function σ(z) = 1 / (1 + exp(-z)).

    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)

    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))

    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)

    return out


def logistic_cdf(x: np.ndarray, mu: float = 0.0, s: float = 1.0) -> np.ndarray:
    if s <= 0:
        raise ValueError("scale s must be > 0")
    z = (np.asarray(x, dtype=float) - mu) / s
    return sigmoid(z)


def logistic_pdf(x: np.ndarray, mu: float = 0.0, s: float = 1.0) -> np.ndarray:
    if s <= 0:
        raise ValueError("scale s must be > 0")
    z = (np.asarray(x, dtype=float) - mu) / s
    p = sigmoid(z)
    return (p * (1.0 - p)) / s


def logistic_logpdf(x: np.ndarray, mu: float = 0.0, s: float = 1.0) -> np.ndarray:
    # Stable log-PDF using logaddexp:
    # log f(x) = -log s - z - 2 log(1 + exp(-z)), where z=(x-mu)/s.

    if s <= 0:
        raise ValueError("scale s must be > 0")
    z = (np.asarray(x, dtype=float) - mu) / s
    return -np.log(s) - z - 2.0 * np.logaddexp(0.0, -z)


def logistic_ppf(p: np.ndarray, mu: float = 0.0, s: float = 1.0, eps: float = 1e-12) -> np.ndarray:
    if s <= 0:
        raise ValueError("scale s must be > 0")
    p = np.asarray(p, dtype=float)
    p = np.clip(p, eps, 1.0 - eps)
    return mu + s * (np.log(p) - np.log1p(-p))


def logistic_rvs(
    rng: np.random.Generator,
    size: int | tuple[int, ...],
    mu: float = 0.0,
    s: float = 1.0,
) -> np.ndarray:
    # NumPy-only sampling via inverse CDF.

    u = rng.random(size=size)
    return logistic_ppf(u, mu=mu, s=s)


def logistic_moments(mu: float = 0.0, s: float = 1.0) -> dict:
    if s <= 0:
        raise ValueError("scale s must be > 0")

    mean = mu
    var = (np.pi * s) ** 2 / 3.0

    return {
        "mean": mean,
        "variance": var,
        "skewness": 0.0,
        "kurtosis": 4.2,  # non-excess
        "excess_kurtosis": 6.0 / 5.0,
        "median": mu,
        "mode": mu,
    }


def logistic_entropy(s: float = 1.0) -> float:
    if s <= 0:
        raise ValueError("scale s must be > 0")
    return float(np.log(s) + 2.0)


def logistic_mgf(t: np.ndarray, mu: float = 0.0, s: float = 1.0) -> np.ndarray:
    # MGF M_X(t) = E[e^{tX}] for |t| < 1/s.

    if s <= 0:
        raise ValueError("scale s must be > 0")

    t = np.asarray(t, dtype=float)
    x = np.pi * s * t

    out = np.full_like(t, np.nan, dtype=float)
    ok = np.abs(t) < (1.0 / s)

    ratio = np.empty_like(x)
    small = np.abs(x) < 1e-4
    ratio[small] = 1.0 + (x[small] ** 2) / 6.0 + 7.0 * (x[small] ** 4) / 360.0
    ratio[~small] = x[~small] / np.sin(x[~small])

    out[ok] = np.exp(mu * t[ok]) * ratio[ok]
    return out


def logistic_cf(t: np.ndarray, mu: float = 0.0, s: float = 1.0) -> np.ndarray:
    # Characteristic function φ_X(t) = E[e^{itX}] for real t.

    if s <= 0:
        raise ValueError("scale s must be > 0")

    t = np.asarray(t, dtype=float)
    x = np.pi * s * t

    ratio = np.empty_like(x)
    small = np.abs(x) < 1e-4
    ratio[small] = 1.0 - (x[small] ** 2) / 6.0 + 7.0 * (x[small] ** 4) / 360.0
    ratio[~small] = x[~small] / np.sinh(x[~small])

    return np.exp(1j * mu * t) * ratio

4) Moments & Properties#

Let \(X\sim\mathrm{Logistic}(\mu,s)\).

4.1 Mean, variance, skewness, kurtosis#

  • Mean: \(\mathbb{E}[X] = \mu\).

  • Variance: \(\mathrm{Var}(X) = \dfrac{\pi^2 s^2}{3}\).

  • Skewness: \(0\) (symmetry).

  • Kurtosis: \(4.2\) (so excess kurtosis is \(6/5=1.2\)).

Also:

  • Median: \(\mu\).

  • Mode: \(\mu\).

4.2 MGF and characteristic function#

The MGF exists only on a strip around 0 (because tails are exponential):

\[M_X(t)=\mathbb{E}[e^{tX}] = e^{\mu t}\,\frac{\pi s t}{\sin(\pi s t)},\qquad |t|<\frac{1}{s}.\]

The characteristic function exists for all real \(t\):

\[\varphi_X(t)=\mathbb{E}[e^{itX}] = e^{i\mu t}\,\frac{\pi s t}{\sinh(\pi s t)}.\]

4.3 Differential entropy#

The logistic distribution has a simple differential entropy:

\[h(X) = \ln(s) + 2.\]
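As a quick sanity check (using scipy.stats.logistic rather than the helpers defined above, so the cell stands alone), the entropy can be estimated by Monte Carlo as \(\mathbb{E}[-\log f(X)]\):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
s = 1.7

# h(X) = E[-log f(X)] estimated from samples
x = stats.logistic.rvs(loc=0.0, scale=s, size=200_000, random_state=rng)
h_mc = -stats.logistic.logpdf(x, loc=0.0, scale=s).mean()
h_theory = np.log(s) + 2.0

print("Monte Carlo:", h_mc)
print("ln(s) + 2  :", h_theory)
```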

4.4 Tail behavior#

For large \(|x|\), the logistic density behaves like

\[f(x) \approx \frac{1}{s}e^{-|x-\mu|/s},\]

so it has exponential tails (heavier than Gaussian, lighter than power-law tails).
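A small numerical illustration of the tail claim (a sketch, not part of the notebook's helper API): compare survival probabilities of a unit-variance logistic against a standard normal.

```python
import numpy as np
from scipy import stats

s = np.sqrt(3.0) / np.pi              # Logistic(0, s) has unit variance
logi = stats.logistic(scale=s)
gauss = stats.norm()                  # also unit variance

# Exponential vs Gaussian decay: the gap widens rapidly with x
for x in [2.0, 4.0, 6.0]:
    print(f"x={x}: logistic tail={logi.sf(x):.2e}  normal tail={gauss.sf(x):.2e}")
```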

# Quick numerical checks: moments + MGF (Monte Carlo)
mu0, s0 = 0.7, 1.3
n = 200_000

samples = logistic_rvs(rng, size=n, mu=mu0, s=s0)

mom = logistic_moments(mu=mu0, s=s0)
mean_mc = samples.mean()
var_mc = samples.var(ddof=0)

skew_mc = stats.skew(samples)
kurt_mc = stats.kurtosis(samples, fisher=False)  # non-excess

mom, mean_mc, var_mc, skew_mc, kurt_mc
({'mean': 0.7,
  'variance': 5.559877145947005,
  'skewness': 0.0,
  'kurtosis': 4.2,
  'excess_kurtosis': 1.2,
  'median': 0.7,
  'mode': 0.7},
 0.703193870225651,
 5.502046897613281,
 -0.004496053448897028,
 4.170427457166342)
# MGF check for a few t in the valid range |t| < 1/s
# (Monte Carlo estimate: mean(exp(tX)))

ts = np.array([-0.4, -0.2, 0.2, 0.4]) / s0  # safely within (-1/s, 1/s)

mgf_theory = logistic_mgf(ts, mu=mu0, s=s0)
mgf_mc = np.array([np.mean(np.exp(t * samples)) for t in ts])

np.column_stack([ts, mgf_theory, mgf_mc])
array([[-0.3077,  1.0653,  1.0606],
       [-0.1538,  0.9598,  0.9587],
       [ 0.1538,  1.1905,  1.1902],
       [ 0.3077,  1.6389,  1.6343]])
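An analogous Monte Carlo check for the characteristic function (written self-contained here, with NumPy's built-in Generator.logistic sampler, so it does not depend on the helpers above):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, s = 0.7, 1.3
x = rng.logistic(loc=mu, scale=s, size=200_000)

# phi(t) = e^{i mu t} * (pi s t) / sinh(pi s t), valid for all real t
ts = np.array([0.2, 0.5, 1.0])
arg = np.pi * s * ts
cf_theory = np.exp(1j * mu * ts) * arg / np.sinh(arg)
cf_mc = np.array([np.mean(np.exp(1j * t * x)) for t in ts])

print("theory:", cf_theory)
print("MC    :", cf_mc)
```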

5) Parameter Interpretation#

5.1 Meaning of the parameters#

  • Location \(\mu\) shifts the distribution left/right.

    • mean = median = mode = \(\mu\)

  • Scale \(s\) stretches/compresses the distribution.

    • standard deviation: \(\sigma = \dfrac{\pi s}{\sqrt{3}}\)

    • interquartile range (IQR): \(\mathrm{IQR} = F^{-1}(0.75)-F^{-1}(0.25)=2s\log 3\)

5.2 Shape changes#

  • Increasing \(s\) makes the density wider and the peak lower.

  • Decreasing \(s\) concentrates mass more tightly around \(\mu\).

Because this is a location–scale family, changing \((\mu,s)\) never changes the fundamental shape; it only shifts and rescales it.

# Useful scale relationships

def logistic_sd(s: float) -> float:
    return float(np.pi * s / np.sqrt(3.0))


def logistic_iqr(s: float) -> float:
    return float(2.0 * s * np.log(3.0))

for s in [0.5, 1.0, 2.0]:
    print(f"s={s:>4}: sd={logistic_sd(s):.4f}, IQR={logistic_iqr(s):.4f}")
s= 0.5: sd=0.9069, IQR=1.0986
s= 1.0: sd=1.8138, IQR=2.1972
s= 2.0: sd=3.6276, IQR=4.3944

6) Derivations#

6.1 Expectation#

A very convenient representation comes from inverse-CDF sampling. If \(U\sim\mathrm{Unif}(0,1)\) then

\[X = \mu + s\,\log\!\left(\frac{U}{1-U}\right).\]

So

\[\mathbb{E}[X]=\mu + s\,\mathbb{E}\left[\log\!\left(\frac{U}{1-U}\right)\right].\]

But the integrand is antisymmetric around \(1/2\):

(5)#\[\begin{align} \mathbb{E}\left[\log\!\left(\frac{U}{1-U}\right)\right] &=\int_0^1 \log\!\left(\frac{u}{1-u}\right)\,du \\ &= -\int_0^1 \log\!\left(\frac{u}{1-u}\right)\,du \quad (u\mapsto 1-u), \end{align}\]

so the integral must be \(0\). Therefore \(\mathbb{E}[X]=\mu\).

6.2 MGF and variance#

Let \(Z\sim\mathrm{Logistic}(0,1)\) with CDF \(F(z)=\sigma(z)\). Use the substitution \(u=F(z)\). Because \(du=f(z)\,dz\), we get

(6)#\[\begin{align} M_Z(t) &=\int_{-\infty}^{\infty} e^{tz} f(z)\,dz \\ &=\int_0^1 \exp\left(t\log\!\left(\frac{u}{1-u}\right)\right)\,du \\ &=\int_0^1 u^t (1-u)^{-t}\,du \\ &= B(1+t,1-t) = \Gamma(1+t)\Gamma(1-t). \end{align}\]

This integral is finite only if \(t\in(-1,1)\). Using the reflection identity \(\Gamma(1+t)\Gamma(1-t)=\dfrac{\pi t}{\sin(\pi t)}\), we obtain

\[M_Z(t)=\frac{\pi t}{\sin(\pi t)},\qquad |t|<1.\]

For a general location–scale transform \(X=\mu+sZ\),

\[M_X(t)=e^{\mu t}M_Z(st)=e^{\mu t}\,\frac{\pi s t}{\sin(\pi s t)},\qquad |t|<\frac{1}{s}.\]

To get the variance, expand around \(t=0\). Using

\[\frac{x}{\sin x} = 1 + \frac{x^2}{6} + O(x^4),\]

we get

(7)#\[\begin{align} M_X(t) &= e^{\mu t}\left(1 + \frac{(\pi s t)^2}{6} + O(t^4)\right) \\ &= 1 + \mu t + \left(\frac{\mu^2}{2} + \frac{\pi^2 s^2}{6}\right)t^2 + O(t^3). \end{align}\]

So \(\mathbb{E}[X]=M_X'(0)=\mu\) and \(\mathbb{E}[X^2]=M_X''(0)=\mu^2+\dfrac{\pi^2 s^2}{3}\). Therefore

\[\mathrm{Var}(X)=\mathbb{E}[X^2]-\mathbb{E}[X]^2=\frac{\pi^2 s^2}{3}.\]
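The Beta/Gamma step in the derivation can be spot-checked numerically with scipy.special (a verification sketch, not part of the notebook's helpers): \(B(1+t,1-t)=\Gamma(1+t)\Gamma(1-t)=\pi t/\sin(\pi t)\) for \(|t|<1\).

```python
import numpy as np
from scipy import special

# All three expressions should agree to machine precision on (-1, 1)
for t in [0.1, 0.4, 0.8]:
    beta_val = special.beta(1.0 + t, 1.0 - t)
    gamma_prod = special.gamma(1.0 + t) * special.gamma(1.0 - t)
    closed_form = np.pi * t / np.sin(np.pi * t)
    print(f"t={t}: B={beta_val:.10f}  Gamma product={gamma_prod:.10f}  pi t/sin(pi t)={closed_form:.10f}")
```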

6.3 Likelihood (iid sample)#

For data \(x_1,\ldots,x_n\) i.i.d. from \(\mathrm{Logistic}(\mu,s)\),

\[L(\mu,s)=\prod_{i=1}^n \frac{e^{-z_i}}{s(1+e^{-z_i})^2},\qquad z_i=\frac{x_i-\mu}{s}.\]

The log-likelihood is

(8)#\[\begin{align} \ell(\mu,s) &=\sum_{i=1}^n \log f(x_i\mid\mu,s)\\ &= -n\log s - \sum_{i=1}^n z_i - 2\sum_{i=1}^n \log(1+e^{-z_i}). \end{align}\]

There is no closed-form MLE in general; it is typically found by numerical optimization.

def logistic_loglik(x: np.ndarray, mu: float, s: float) -> float:
    return float(np.sum(logistic_logpdf(x, mu=mu, s=s)))


def fit_logistic_mle(x: np.ndarray, mu_init: float | None = None, s_init: float | None = None):
    x = np.asarray(x, dtype=float)

    if mu_init is None:
        mu_init = float(np.median(x))
    if s_init is None:
        s_init = float(np.std(x, ddof=0) * np.sqrt(3.0) / np.pi)
        s_init = max(s_init, 1e-3)

    def nll(theta: np.ndarray) -> float:
        mu, log_s = float(theta[0]), float(theta[1])
        s = float(np.exp(log_s))
        return -logistic_loglik(x, mu=mu, s=s)

    res = optimize.minimize(nll, x0=np.array([mu_init, np.log(s_init)]), method="BFGS")
    mu_hat, log_s_hat = res.x
    return {
        "mu_hat": float(mu_hat),
        "s_hat": float(np.exp(log_s_hat)),
        "success": bool(res.success),
        "message": res.message,
        "fun": float(res.fun),
    }


# Compare our simple MLE to SciPy's fit on simulated data
x_data = logistic_rvs(rng, size=5_000, mu=1.2, s=0.8)

ours = fit_logistic_mle(x_data)
scipy_loc, scipy_scale = stats.logistic.fit(x_data)

ours, (scipy_loc, scipy_scale)
({'mu_hat': 1.2258002867931699,
  's_hat': 0.7831215632335,
  'success': True,
  'message': 'Optimization terminated successfully.',
  'fun': 8783.809900252894},
 (1.2258003058433171, 0.7831215768119856))

7) Sampling & Simulation#

7.1 Inverse transform sampling#

Because the logistic CDF is invertible in closed form, we can sample using the inverse CDF.

If \(U\sim\mathrm{Unif}(0,1)\) and \(X=F^{-1}(U)\), then \(X\) has CDF \(F\). For logistic:

\[X = \mu + s\,\log\!\left(\frac{U}{1-U}\right).\]

7.2 Practical notes#

  • When implementing \(\log\!\left(\frac{U}{1-U}\right)\) numerically, use \(\log U - \log(1-U)\), computing the second term as log1p(-u) for stability.

  • Clip \(U\) away from exactly 0 and 1 to avoid returning \(\pm\infty\).

Algorithm (vectorized)

  1. Draw \(u \leftarrow \mathrm{Uniform}(0,1)\)

  2. Set \(u \leftarrow \mathrm{clip}(u,\varepsilon, 1-\varepsilon)\)

  3. Return \(x \leftarrow \mu + s(\log u - \log(1-u))\)

# Sampling sanity checks
mu0, s0 = -0.5, 1.7

x = logistic_rvs(rng, size=200_000, mu=mu0, s=s0)

# 1) Mean/variance
print('mean (mc)', x.mean(), 'theory', logistic_moments(mu0, s0)['mean'])
print('var  (mc)', x.var(ddof=0), 'theory', logistic_moments(mu0, s0)['variance'])

# 2) Probability integral transform: F(X) should look Uniform(0,1)
u = logistic_cdf(x, mu=mu0, s=s0)
print('u mean', u.mean(), 'u var', u.var(ddof=0))

# Compare a few quantiles to Uniform(0,1)
qs = np.array([0.01, 0.1, 0.5, 0.9, 0.99])
print('empirical u-quantiles:', np.quantile(u, qs))
print('target quantiles     :', qs)
mean (mc) -0.5094692359972378 theory -0.5
var  (mc) 9.4866561100536 theory 9.507718906382747
u mean 0.4992795266980339 u var 0.08335766686663254
empirical u-quantiles: [0.0099 0.0994 0.4993 0.8989 0.99  ]
target quantiles     : [0.01 0.1  0.5  0.9  0.99]

8) Visualization#

We’ll visualize:

  • the theoretical PDF and CDF for several parameter choices

  • Monte Carlo samples from the NumPy-only sampler

# PDF/CDF for several parameter choices

params = [
    (0.0, 0.6),
    (0.0, 1.0),
    (0.0, 2.0),
    (2.0, 1.0),
]

# choose an x-range that covers all cases (0.001 to 0.999 quantiles)
lo = min(logistic_ppf(1e-3, mu=mu, s=s) for mu, s in params)
hi = max(logistic_ppf(1 - 1e-3, mu=mu, s=s) for mu, s in params)
xx = np.linspace(lo, hi, 800)

fig = make_subplots(rows=1, cols=2, subplot_titles=("PDF", "CDF"))

for mu, s in params:
    label = f"μ={mu}, s={s}"
    fig.add_trace(go.Scatter(x=xx, y=logistic_pdf(xx, mu=mu, s=s), mode="lines", name=label), row=1, col=1)
    fig.add_trace(go.Scatter(x=xx, y=logistic_cdf(xx, mu=mu, s=s), mode="lines", showlegend=False), row=1, col=2)

fig.update_xaxes(title_text="x", row=1, col=1)
fig.update_xaxes(title_text="x", row=1, col=2)
fig.update_yaxes(title_text="density", row=1, col=1)
fig.update_yaxes(title_text="probability", row=1, col=2)

fig.update_layout(title="Logistic distribution: PDF and CDF", width=950, height=420)
fig.show()
# Monte Carlo histogram + PDF overlay

mu0, s0 = 0.0, 1.0
samples_mc = logistic_rvs(rng, size=80_000, mu=mu0, s=s0)

x_grid = np.linspace(logistic_ppf(1e-4, mu0, s0), logistic_ppf(1 - 1e-4, mu0, s0), 900)

fig = go.Figure()
fig.add_trace(
    go.Histogram(
        x=samples_mc,
        nbinsx=70,
        histnorm="probability density",
        name="Monte Carlo (NumPy-only)",
        opacity=0.55,
    )
)
fig.add_trace(
    go.Scatter(
        x=x_grid,
        y=logistic_pdf(x_grid, mu=mu0, s=s0),
        mode="lines",
        name="True PDF",
        line=dict(width=3),
    )
)

fig.update_layout(title=f"Logistic(μ={mu0}, s={s0}): histogram vs PDF", width=900, height=420)
fig.show()
# CDF: theoretical vs empirical

x_grid = np.linspace(logistic_ppf(1e-4, mu0, s0), logistic_ppf(1 - 1e-4, mu0, s0), 700)

emp_x = np.sort(samples_mc)
emp_cdf = np.arange(1, emp_x.size + 1) / emp_x.size

fig = go.Figure()
fig.add_trace(go.Scatter(x=x_grid, y=logistic_cdf(x_grid, mu=mu0, s=s0), mode="lines", name="True CDF"))
fig.add_trace(
    go.Scatter(
        x=emp_x[::200],
        y=emp_cdf[::200],
        mode="markers",
        name="Empirical CDF (subsampled)",
        marker=dict(size=4, opacity=0.55),
    )
)

fig.update_layout(title=f"Logistic(μ={mu0}, s={s0}): CDF vs empirical", width=900, height=420)
fig.show()

9) SciPy Integration (scipy.stats.logistic)#

SciPy parameterization:

stats.logistic(loc=mu, scale=s)
  • loc is the location parameter \(\mu\).

  • scale is the scale parameter \(s>0\).

SciPy provides:

  • pdf, logpdf, cdf, ppf

  • rvs for sampling

  • fit for MLE

dist = stats.logistic(loc=mu0, scale=s0)

x_test = np.linspace(-2, 2, 5)

pdf = dist.pdf(x_test)
cdf = dist.cdf(x_test)
samples_scipy = dist.rvs(size=5, random_state=rng)

pdf, cdf, samples_scipy
(array([0.105 , 0.1966, 0.25  , 0.1966, 0.105 ]),
 array([0.1192, 0.2689, 0.5   , 0.7311, 0.8808]),
 array([-0.9978,  1.6964, -1.7808, -1.3917, -2.842 ]))
# MLE fit example
true_mu, true_s = 1.5, 0.9
x_fit = stats.logistic(loc=true_mu, scale=true_s).rvs(size=10_000, random_state=rng)

mu_hat, s_hat = stats.logistic.fit(x_fit)  # returns (loc, scale)

true_mu, true_s, mu_hat, s_hat
(1.5, 0.9, 1.5159770568469728, 0.9025679317202033)

10) Statistical Use Cases#

10.1 Hypothesis testing (location)#

If you assume data are logistic with unknown \((\mu,s)\), a common hypothesis is

\[H_0: \mu = \mu_0 \quad \text{vs} \quad H_1: \mu \ne \mu_0.\]

You can use a likelihood-ratio test (LRT):

\[\Lambda = 2\bigl(\ell(\hat\mu,\hat s) - \ell(\mu_0, \tilde s)\bigr) \overset{\text{approx}}{\sim} \chi^2_1,\]

where \((\hat\mu,\hat s)\) are the unrestricted MLEs and \(\tilde s\) is the MLE under \(H_0\).

10.2 Bayesian modeling#

  • Error model: logistic noise is a heavy-tailed alternative to Gaussian noise.

  • Latent-variable logistic regression: if \(Y=\mathbf{1}\{\eta+\varepsilon>0\}\) with \(\varepsilon\sim\mathrm{Logistic}(0,1)\), then \(\Pr(Y=1\mid\eta)=\sigma(\eta)\). This gives the familiar logistic likelihood used in Bayesian logistic regression.
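The latent-variable identity can be verified directly by thresholding simulated logistic noise (a self-contained sketch using NumPy's built-in logistic sampler):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Pr(eta + eps > 0) with eps ~ Logistic(0,1) should equal sigmoid(eta)
etas = np.array([-1.5, 0.0, 0.8])
p_mc = np.array([np.mean(eta + rng.logistic(size=n) > 0.0) for eta in etas])
p_sigmoid = 1.0 / (1.0 + np.exp(-etas))

print("eta    :", etas)
print("MC     :", p_mc)
print("sigmoid:", p_sigmoid)
```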

10.3 Generative modeling#

  • Inverse-CDF sampling makes logistic a convenient base distribution.

  • Mixtures of logistics can model multimodal or skewed densities and appear in modern neural generative models.

# 10.1 Likelihood-ratio test example: H0: mu = 0

rng_test = np.random.default_rng(123)

n = 400
mu_true, s_true = 0.35, 1.0
x = logistic_rvs(rng_test, size=n, mu=mu_true, s=s_true)


def mle_unrestricted(x: np.ndarray):
    x = np.asarray(x, dtype=float)

    def nll(theta: np.ndarray) -> float:
        mu, log_s = float(theta[0]), float(theta[1])
        s = float(np.exp(log_s))
        return -logistic_loglik(x, mu=mu, s=s)

    mu_init = float(np.median(x))
    s_init = float(np.std(x, ddof=0) * np.sqrt(3.0) / np.pi)

    res = optimize.minimize(nll, x0=np.array([mu_init, np.log(max(s_init, 1e-3))]), method="BFGS")
    mu_hat, log_s_hat = res.x
    return float(mu_hat), float(np.exp(log_s_hat)), float(-res.fun)


def mle_mu_fixed(x: np.ndarray, mu0: float):
    x = np.asarray(x, dtype=float)

    def nll(log_s: np.ndarray) -> float:
        # the optimizer passes a length-1 array; extract the scalar explicitly
        s = float(np.exp(log_s[0]))
        return -logistic_loglik(x, mu=mu0, s=s)

    s_init = float(np.std(x, ddof=0) * np.sqrt(3.0) / np.pi)
    res = optimize.minimize(nll, x0=np.array([np.log(max(s_init, 1e-3))]), method="BFGS")
    s_hat = float(np.exp(res.x[0]))
    return s_hat, float(-res.fun)


mu0 = 0.0
mu_hat, s_hat, ll1 = mle_unrestricted(x)
s_tilde, ll0 = mle_mu_fixed(x, mu0=mu0)

lrt = 2.0 * (ll1 - ll0)
p_value = 1.0 - chi2.cdf(lrt, df=1)

{
    "n": n,
    "true": (mu_true, s_true),
    "mle_unrestricted": (mu_hat, s_hat),
    "mle_H0": (mu0, s_tilde),
    "LRT": lrt,
    "p_value": p_value,
}
{'n': 400,
 'true': (0.35, 1.0),
 'mle_unrestricted': (0.26463042446936813, 0.9882888911189841),
 'mle_H0': (0.0, 0.9992910134947958),
 'LRT': 9.410232061566376,
 'p_value': 0.0021577791606112173}
# 10.2 Bayesian example: posterior over mu with known scale (grid approximation)

x = logistic_rvs(rng, size=200, mu=0.6, s=1.0)
s_known = 1.0

# Prior: mu ~ Normal(0, 2^2)
mu_grid = np.linspace(-2.5, 2.5, 1201)
log_prior = norm(loc=0.0, scale=2.0).logpdf(mu_grid)

# Log-likelihood for each mu on the grid
log_like = np.array([logistic_loglik(x, mu=mu, s=s_known) for mu in mu_grid])
log_post_unnorm = log_prior + log_like
log_post = log_post_unnorm - np.max(log_post_unnorm)
post = np.exp(log_post)
post /= post.sum()

post_mean = float(np.sum(mu_grid * post))
post_cdf = np.cumsum(post)
ci_low = float(mu_grid[np.searchsorted(post_cdf, 0.025)])
ci_high = float(mu_grid[np.searchsorted(post_cdf, 0.975)])

(post_mean, (ci_low, ci_high))
(0.5288318801511985, (0.2875000000000001, 0.7708333333333335))
# Visualize the posterior

fig = go.Figure()
fig.add_trace(go.Scatter(x=mu_grid, y=post, mode="lines", name="posterior"))
fig.add_vline(x=post_mean, line_dash="dash", line_color="black", annotation_text="posterior mean")
fig.add_vrect(x0=ci_low, x1=ci_high, fillcolor="gray", opacity=0.2, line_width=0)

fig.update_layout(
    title="Posterior over μ (known s): grid approximation",
    xaxis_title="μ",
    yaxis_title="posterior density (discrete grid)",
    width=900,
    height=420,
)
fig.show()
# 10.3 Generative modeling: a simple mixture of logistics

weights = np.array([0.55, 0.45])
components = [(-1.2, 0.6), (1.4, 0.9)]  # (mu, s)


def mixture_logistic_pdf(x: np.ndarray) -> np.ndarray:
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for w, (mu, s) in zip(weights, components):
        out += w * logistic_pdf(x, mu=mu, s=s)
    return out


def mixture_logistic_rvs(rng: np.random.Generator, size: int) -> np.ndarray:
    k = rng.choice(len(weights), size=size, p=weights)
    out = np.empty(size, dtype=float)
    for idx in range(len(weights)):
        mask = k == idx
        mu, s = components[idx]
        out[mask] = logistic_rvs(rng, size=int(mask.sum()), mu=mu, s=s)
    return out


mix_samples = mixture_logistic_rvs(rng, size=60_000)

x_grid = np.linspace(np.quantile(mix_samples, 0.001), np.quantile(mix_samples, 0.999), 900)

fig = go.Figure()
fig.add_trace(
    go.Histogram(
        x=mix_samples,
        nbinsx=90,
        histnorm="probability density",
        name="samples",
        opacity=0.55,
    )
)
fig.add_trace(go.Scatter(x=x_grid, y=mixture_logistic_pdf(x_grid), mode="lines", name="mixture PDF", line=dict(width=3)))

fig.update_layout(title="Mixture of logistics: histogram vs PDF", width=900, height=420)
fig.show()

11) Pitfalls#

  • Invalid scale: \(s\le 0\) is not a valid logistic distribution.

  • Overflow in naive formulas:

    • np.exp(-z) overflows if \(z\) is very negative.

    • use stable forms (piecewise sigmoid, logaddexp, log1p).

  • Sampling at the boundaries:

    • the inverse CDF uses \(\log\!\left(\frac{p}{1-p}\right)\); if \(p\) is exactly 0 or 1, you get \(\pm\infty\).

    • clip \(p\) (or the underlying uniform draws) away from {0,1}.

  • MGF domain:

    • \(M_X(t)\) exists only for \(|t|<1/s\).

  • Parameterization confusion:

    • some sources parameterize logistic by a “steepness” \(k=1/s\).

    • SciPy uses (loc, scale).

  • Fitting:

    • for small samples, MLE can be noisy; prefer robust starting points (median + variance-based scale).
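The overflow pitfall is easy to demonstrate concretely (a sketch for the standard logistic): the naive log-PDF \(-z - 2\log(1+e^{-z})\) collapses to \(-\infty\) for very negative \(z\) because \(e^{-z}\) overflows, while the logaddexp form stays finite.

```python
import numpy as np

z = np.array([-800.0, 0.0, 800.0])

# naive: exp(800) overflows to inf, so log1p(inf) = inf and the result is -inf
with np.errstate(over="ignore"):
    naive = -z - 2.0 * np.log1p(np.exp(-z))

# stable: logaddexp(0, -z) = log(1 + exp(-z)) without forming exp(-z)
stable = -z - 2.0 * np.logaddexp(0.0, -z)

print("naive :", naive)   # -inf at z = -800
print("stable:", stable)  # finite everywhere
```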

12) Summary#

  • logistic is a continuous distribution on \(\mathbb{R}\) with CDF equal to the sigmoid.

  • Parameters: location \(\mu\in\mathbb{R}\) and scale \(s>0\) (a pure shift/scale family).

  • Key formulas:

    • \(\mathbb{E}[X]=\mu\),

    • \(\mathrm{Var}(X)=\pi^2 s^2/3\),

    • \(h(X)=\ln(s)+2\),

    • \(M_X(t)=e^{\mu t}\,\pi s t/\sin(\pi s t)\) for \(|t|<1/s\).

  • Sampling is easy via inverse CDF: \(\mu+s\log\!\left(\frac{U}{1-U}\right)\).

References

  • SciPy documentation: scipy.stats.logistic.

  • Reflection identity: \(\Gamma(z)\Gamma(1-z)=\pi/\sin(\pi z)\).

  • Mixture of logistics in neural generative modeling: PixelCNN++ (Salimans et al., 2017) uses discretized logistic mixtures.